AMENDMENT AND RESPONSE UNDER 37 CFR § 1.111 Docket: P17893 

Serial Number: 10/749,752 
Filing Date: December 30, 2003 

Title: PROTOCOL FOR MAINTAINING CACHE COHERENCY IN A CMP 

Amendment to the Claims: 

1. (Currently Amended) An apparatus for maintaining cache coherency comprising: 
an integrated circuit including 

a plurality of processor cores, wherein the plurality of processor cores each 
include a private cache; 

a shared cache adapted to be shared by the plurality of processor cores, wherein 
the shared cache includes logic, in response to receiving a write request 
referencing a block from a requesting processor core of the plurality of 
processor cores and the block not being owned, adapted to generate a first 
message including an invalidation part and a write-acknowledgement part, 
and wherein at least the invalidate part of the first message is to be 
delivered to a second processor core of the plurality of 
processor cores to invalidate the block in the second processor core and 
the at least the write-acknowledgement part of the first message is to only 
be delivered to the requesting processor core 
to act as a write acknowledgement to the requesting processor core; and 

a ring to connect the plurality of processor cores and the shared cache, the ring to 
transmit the first message to the requesting processor core and second 
processor core. 

2. (Canceled) 

3. (Previously Presented) The apparatus of claim 1 wherein the shared cache includes one or 
more banks, wherein the one or more cache banks is responsible for a subset of a physical 
address space of the system, and wherein the block is associated with a physical address 
of the physical address space of the system. 



4. (Previously Presented) The apparatus of claim 1 wherein the first message includes an 
InvalidateAndAcknowledge message, and wherein the shared cache is to generate the 
InvalidateAndAcknowledge message, further in response to the block being present in the 
shared cache and the second processor core being a custodian for the block. 

5. (Previously Presented) The apparatus of claim 1 wherein the first message includes an 
InvalidateAllAndAcknowledge message, and wherein the shared cache, in response to 
receiving the write request referencing the block from the requesting processor core of 
the plurality of processor cores and the block not being owned, is to generate the 
InvalidateAllAndAcknowledge message, further in response to the block not being 
present in the shared cache and none of the plurality of processor cores being a custodian 
for the block. 

6. (Previously Presented) The apparatus of claim 1 wherein the plurality of processor cores 
writes data through to the shared cache. 

7. (Previously Presented) The apparatus of claim 1 wherein the plurality of processor cores 
each include a merge buffer, and wherein each of the merge buffers are to coalesce 
multiple stores to a same block. 

8. (Previously Presented) The apparatus of claim 1 wherein the shared cache is to fetch a 
second block from a memory and generate a write acknowledge message to provide a 
write acknowledgement to the requesting processor core in response to receiving a 
second write request referencing the second block, the second block not being present in 
the shared cache and not being owned by any of the plurality of processor cores. 

9. (Previously Presented) The apparatus of claim 8 wherein the shared cache is to generate 
an evict message to evict a third block from an owning processor core and generate a 
second write acknowledge message to provide a second write acknowledgment to the 
requesting processor core in response to receiving a third write request referencing the 
third block, the third block being present in the shared cache and the owning processor 
core of the plurality of cores owns the third block. 

10. (Previously Presented) The apparatus of claim 1 wherein a bank of the shared 
cache is to be a home location for a non-overlapping portion of a physical address space 
associated with the block. 

11. (Previously Presented) The apparatus of claim 7 wherein each private cache of the 
plurality of cores are not to hold dirty data, and wherein each of the merge buffers are to 
hold the dirty data. 

12. (Previously Presented) The apparatus of claim 1 wherein the ring is a synchronous, 
unbuffered bidirectional ring interconnect. 

13. (Previously Presented) The apparatus of claim 12 wherein the first message has a fixed 
deterministic latency around the ring interconnect. 

14. (Currently Amended) An apparatus comprising: 

an integrated circuit including: a plurality of cores and a shared memory connected in a 
ring, the shared memory to be accessible by each of the plurality of cores, wherein 
each of the plurality of cores includes a private memory and a merge buffer to 
purge data to the shared memory, and wherein the shared memory includes: 

receiving logic to receive, from a requesting core of the plurality of cores, a read 
request referencing the address, 

ownership logic to determine an owning processor core of the plurality of 
processor cores owns a block associated with the address, and 

eviction logic coupled to the receiving logic and the ownership logic, the eviction 
logic to generate an evict message referencing the address and the owning 
processor core in response to the receiving logic receiving the read request 
and the ownership logic determining the owning processor core owns the 
block. 

15. (Previously Presented) The apparatus of claim 14, wherein the ring includes a 
synchronous unbuffered bi-directional ring interconnect. 

16. (Previously Presented) The apparatus of claim 14, wherein the shared memory is a shared 
cache including a plurality of blocks, and wherein the shared cache is capable of holding 
each of the plurality of blocks in a cache coherency state. 

17. (Previously Presented) The apparatus of claim 16, wherein the cache coherency state for 
each of the plurality of blocks is selected from a group consisting of (1) a not present 
state, (2) a present and owned by a core of the plurality of cores state, (3) a present, not 
owned, and custodian is a core of the plurality of cores state, and (4) a present, not 
owned, and no custodian state. 

18. (Currently Amended) A system comprising: 

a processor including: a plurality of cores and a shared memory to be coupled together 
with an unbuffered bi-directional ring interconnect, wherein each of the plurality 
of cores is to be associated with a private cache memory, the shared memory is to 
be accessible by each of the plurality of cores, and the shared memory is to 
include a plurality of blocks, each of the plurality of blocks capable of being held 
in the shared memory by logic in the shared memory in: 

a not present in the shared memory state; 

a present in the shared memory and owned by a core of the plurality of 
cores state; 

a present in the shared memory, not owned, and a core of the plurality of 
cores is a custodian state; and 

a present in the shared memory, not owned, and no core of the plurality 
of cores is a custodian state; and 

a system memory associated with the processor to hold elements to be stored by the 
shared memory. 

19. (Previously Presented) The system of claim 18, wherein each of the plurality of blocks is 
a home location for a subset of a physical address space. 

20. (Previously Presented) The system of claim 19, wherein the shared cache is to generate a 
first message to invalidate a requested block in all cores of the plurality of cores except 
for a requesting core of the plurality of cores, in response to receiving a write request 
referencing the requested block from the requesting core and requested block being held 
in the present, not owned, and no core of the plurality of cores is a custodian state. 



21. (Currently Amended) A method for maintaining cache coherency comprising: 

receiving, with a shared cache, a write request referencing a block from a requesting 
processor core of the plurality of processor cores on a processor, wherein the 
plurality of processor cores each include a private cache, and wherein the plurality 
of cores and the shared cache are connected by a ring interconnect; 
generating a single message, with the shared cache, in response to receiving the write 
request; 

transmitting the single message on the ring interconnect to at least a second processor 
core of the plurality of processor cores and to the requesting processor core; 

delivering an invalidation part of the single message to at least the second processor core; 

delivering a write-acknowledgement part of the single message only to the requesting 
processor core; 

invalidating the block in the private cache included in the second processor core in 
response to the second processor core receiving the invalidation part of the single 
message transmitted on the ring interconnect; and 

write-acknowledging the write request for the requesting processor core in response to the 
requesting processor core receiving the write-acknowledgment part of the single 
message transmitted on the ring interconnect. 



22. (Previously Presented) The method of claim 21, wherein the shared cache includes one or 
more banks, wherein the one or more cache banks is responsible for a subset of a physical 
address space of a computer system including the processor, and wherein the block is 
associated with a physical address of the physical address space of the computer system. 



23. (Previously Presented) The method of claim 21 wherein the first message includes an 
InvalidateAndAcknowledge message, and wherein generating the 
InvalidateAndAcknowledge message, with the shared cache, is further in response to the 
block being present in the shared cache and the second processor core being a custodian 
for the block. 

24. (Previously Presented) The method of claim 21 wherein the first message includes an 
InvalidateAllAndAcknowledge message, and wherein generating the 
InvalidateAllAndAcknowledge message, with the shared cache, is further in response to 
the block not being present in the shared cache and none of the plurality of processor 
cores being a custodian for the block. 

25. (Previously Presented) The method of claim 21 wherein the plurality of processor cores 
writes data through to the shared cache. 

26. (Previously Presented) The method of claim 21 wherein the plurality of processor cores 
each include a merge buffer, and wherein each of the merge buffers are to coalesce 
multiple stores to a same block. 

27. (Previously Presented) The method of claim 21, further comprising fetching, with the 
shared memory, a second block from a memory and generating, with the shared memory, 
a write acknowledge message to provide a write acknowledgement to the requesting 
processor core in response to receiving a second write request referencing the second 
block, the second block not being present in the shared cache and not being owned by 
any of the plurality of processor cores. 

28. (Previously Presented) The method of claim 27 further comprising generating, with the 
shared cache, an evict message to evict a third block from an owning processor core of 
the plurality of processor cores and generating a second write acknowledge message to 
provide a second write acknowledgment to the requesting processor core in response to 
receiving a third write request referencing the third block, the third block being present in 
the shared cache and the owning processor core of the plurality of cores owns the third 
block. 

29. (Previously Presented) The method of claim 21 wherein a bank of the shared cache is to 
be a home location for a non-overlapping portion of a physical address space associated 
with the block. 

30. (Previously Presented) The method of claim 26 wherein each private cache included in 
the plurality of cores are not to hold dirty data, and wherein each of the merge buffers 
are to hold the dirty data. 

31. (Previously Presented) The method of claim 21 wherein the ring interconnect includes a 
synchronous, unbuffered, bidirectional, ring interconnect. 

32. (Previously Presented) The method of claim 21 wherein the first message has a fixed 
deterministic latency around the ring interconnect. 